CEGS HiSeq Services
For help contact mnajarian@cs.unc.edu
2_5 Female X Chromosome Informative SNPs (Version 1) mcmillan / Version 34

This is the first version of the Informative SNPs dataset, which was mentioned in the May 24th meeting.

The tarball can be downloaded from here. Note that the data set includes both eSNPs and intergenic SNPs.

! A new version, Version 2, is available here.


Summary of the Informative SNPs dataset. (Figures show eSNPs only.)

FG/GF

              Total  eSNPs  non-eSNPs
FG             2203   1899        304
GF             2401   2013        388
Intersection   2140   1856        284

FH/HF

              Total  eSNPs  non-eSNPs
FH             1843   1614        229
HF             1693   1486        207
Intersection   1664   1464        200

GH/HG

              Total  eSNPs  non-eSNPs
GH             2752   2177        575
HG             2256   1808        448
Intersection   2234   1789        445

 


 

Detailed Procedure:

(0) For each F1 cross, we only consider the Sanger SNPs between the maternal and the paternal inbred strains. For example, for the FG cross, we only consider SNPs between CAST and PWK.

(1) We first use the F1 samples and compute, at each SNP position, the ratio of the combined maternal-allele and paternal-allele counts to the pileup height. We remove loci where this ratio is less than 25% in more than 25% of the samples. (This ensures that the reads at each variant come mainly from the maternal or paternal allele rather than from random noise.)

For example, for the FG cross, we have 6 samples. We remove a SNP if more than 1.5 samples (in practice, the threshold is 1 in this case) have a ratio smaller than 25% at that position. We denote this intermediate output Set1.
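The step (1) filter can be sketched roughly as follows. This is a minimal illustration, not the code used to build the dataset: the function name `f1_keep_snp`, the tuple layout of the input, and the handling of positions with zero pileup height are all assumptions made for the example.

```python
def f1_keep_snp(samples, min_ratio=0.25, max_bad_fraction=0.25):
    """Decide whether one SNP position passes the F1 filter (hypothetical helper).

    samples: one (maternal_count, paternal_count, pileup_height) tuple
             per F1 sample at this position (assumed input layout).
    """
    bad = 0
    for maternal, paternal, height in samples:
        # Assumption: a sample with zero pileup height counts as failing.
        ratio = (maternal + paternal) / height if height > 0 else 0.0
        if ratio < min_ratio:
            bad += 1
    # Remove the SNP when more than 25% of the samples fail the ratio test.
    return bad <= max_bad_fraction * len(samples)
```

With 6 samples the cutoff is 0.25 * 6 = 1.5, so in this sketch a position is kept when at most one sample falls below the 25% ratio and removed when two or more do.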

 

(2) Next, we consider the inbred strains and perform a similar procedure on both the maternal and the paternal inbred samples. However, the ratio and the criterion differ from those used for the F1 cross. For maternal samples, we use the ratio of the paternal allele count to the pileup height and remove SNPs where more than 25% of the samples have a ratio greater than 25%. For paternal samples, we do it the other way around. (This ensures that in the maternal inbred strain the reads at each variant come mainly from the maternal allele, and likewise for the paternal inbred strain.)

For example, for the FG cross, we run this once on the FF samples and once on the GG samples. We denote the intermediate outputs Set2 and Set3 for maternal and paternal respectively.
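A rough sketch of the step (2) filter for one inbred strain is given here. The name `inbred_keep_snp`, the input layout, and the choice to ignore samples with zero pileup height are assumptions for illustration only; the original scripts may differ.

```python
def inbred_keep_snp(samples, max_ratio=0.25, max_bad_fraction=0.25):
    """Decide whether one SNP position passes the inbred filter (hypothetical helper).

    samples: one (other_allele_count, pileup_height) tuple per inbred sample,
             where other_allele_count is the paternal allele count for maternal
             inbred samples, and the maternal allele count for paternal ones.
    """
    bad = 0
    for other_allele, height in samples:
        # Assumption: a sample with zero pileup height is not counted as failing.
        if height > 0 and other_allele / height > max_ratio:
            bad += 1
    # Remove the SNP when more than 25% of the samples show too much
    # of the other strain's allele.
    return bad <= max_bad_fraction * len(samples)
```

For the FG cross, this check would be run once with the FF samples (counting the PWK allele) to obtain Set2 and once with the GG samples (counting the CAST allele) to obtain Set3.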

 

(3) The final output consists of the SNP positions that are in both Set1 and Set2, or in both Set1 and Set3; SNPs that are in all of Set1, Set2, and Set3 are of course included. We also remove SNP positions where fewer than 75% of samples have a pileup height greater than 50, to ensure that each SNP has sufficient coverage. This criterion is changed in Version 2.
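The combination in step (3) amounts to taking Set1 intersected with the union of Set2 and Set3, followed by a coverage filter. The sketch below illustrates this under stated assumptions: the function name `informative_snps`, the `heights` mapping from position to per-sample pileup heights, and the decision to drop positions with no height data are hypothetical, and the text does not specify which samples the coverage criterion is computed over.

```python
def informative_snps(set1, set2, set3, heights, min_height=50, min_fraction=0.75):
    """Combine the three intermediate sets and apply the coverage filter.

    set1, set2, set3: sets of SNP positions from steps (1) and (2).
    heights: dict mapping position -> list of per-sample pileup heights
             (assumed layout; which samples are included is an assumption).
    """
    # In (Set1 and Set2) or in (Set1 and Set3).
    candidates = set1 & (set2 | set3)
    kept = set()
    for pos in candidates:
        sample_heights = heights.get(pos, [])
        if not sample_heights:
            continue  # Assumption: positions without height data are dropped.
        covered = sum(1 for h in sample_heights if h > min_height)
        # Keep the position only if at least 75% of samples exceed the height cutoff.
        if covered >= min_fraction * len(sample_heights):
            kept.add(pos)
    return kept
```

For example, a position with per-sample heights of 60, 70, 80 and 10 has 3 of 4 samples above 50, which is exactly 75%, so it is kept in this sketch.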

 

In short, for the FG cross, we remove the bad SNPs using the FG sample data, and we apply the same procedure to its reciprocal cross, GF. A SNP can be good in the FG samples but bad in the GF samples.



© 2010 Leonard McMillan, Alex Jackson and UNC Computational Genetics